Figure 1: TAE&PD components and data flow interaction diagram.


The following is a brief status report on the TAE&PD project's current initiatives
before implementation. It is crucial to prepare the data beforehand, as some attributes
may not be suited to the DM algorithms being considered. In this project we aim to demonstrate
the use of tools and techniques applied in the following KDDM-related fields:


• Databases

• Machine Learning

• Statistics

• Visualization


The following table presents the current scope of the project: for each KDDM field, it lists the
tools and techniques under evaluation and their status. These are still being researched before
the testing environment is developed:


KDDM Field: Databases
Tool(s): Microsoft SQL Server 2008 R2
Technique(s): Data Preprocessing
    • Cleaning
    • Integration
    • Transformation
    • Reduction
Status: Ongoing
    • New incident attributes (Integration) are being considered from other sources that may add more value to study.
    • Dataset contains incidents from 2004 to 2011. WITS site is offline for some time, no 2012 data can be collected for the moment.

KDDM Field: Machine Learning
Tool(s): Microsoft SQL Server 2008 R2 Data Mining Add-ins
Technique(s):
    • Clustering Analysis: Afghanistan incident type volume evaluation by country regions.
Status: In progress
    • Researching: acquired book Data Mining with Microsoft SQL Server 2008

KDDM Field: Statistics
Tool(s): Microsoft SQL Server 2008 R2 Data Mining Add-ins
Technique(s):
    • Time Series Analysis: Prediction of 2012 future incidents regarding deaths, injuries and kidnappings. Explore algorithms provided by tool.
Status: Not started
    • Researching: acquired book Data Mining with Microsoft SQL Server 2008

KDDM Field: Machine Learning
Tool(s): R language / Rattle
Technique(s):
    • Association Rules Analysis: Create rules associating incident month, week and province to type of attack. Explore algorithms provided by tool.
Status: In progress
    • Researching: acquired book Data Mining with Rattle and R ()

KDDM Field: Visualization
Tool(s): R language / Rattle
Technique(s):
    • Graph representation of trends of incident data via the R language. Security Forces (Police and Military) incident victim status by province or other factors.
Status: Not started
    • Researching R packages
        ~ RODBC
        ~ ggplot2
        ~ arules
        ~ RStat

Table 1: TAE&PD project’s current scope based on KDDM-associated fields.


The table above represents an initial blueprint of the project, intended to establish a
defined scope.

After evaluating data analysis software such as R, Weka and RapidMiner, it
was concluded that the R language is the best fit for this project. R can be
customized to the needs of any user and, for DM purposes, it can be used by people with or
without a programming background. R also has the advantage of a vast community and body of
resources in comparison to other DM suites.

From mid-May through the beginning of July 2012, an effort was made
to acquire material for an in-depth study of R and its KDDM capabilities. The R topics
studied so far include, to name a few:

•  Basic  Operations 

•  Function(s) 

• Introduction to Data Structures

    Arrays

    Lists

    Data Frames

•  Basic  Charts  and  Graphs 

•  R  Environment  Creation 

•  Rattle  and  Data  Mining 

• Evaluation of Sampling Strategy

    Training Dataset
    Validation Dataset
    Testing Dataset
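The sampling strategy listed above (training, validation and testing datasets) can be illustrated with a small sketch. It is written in Python purely for illustration; the project itself plans to use R and Rattle. The function name `partition`, the 70/15/15 proportions and the fixed seed are assumptions made for this example and are not taken from the project.

```python
import random


def partition(rows, train=0.70, validate=0.15, seed=42):
    """Shuffle rows and split them into training, validation and testing sets.

    The proportions and the seed are illustrative assumptions, not project settings.
    """
    if not 0 < train < 1 or not 0 <= validate < 1 or train + validate >= 1:
        raise ValueError("train and validate must be fractions summing to less than 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_validate = round(len(shuffled) * validate)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validate]
    testing = shuffled[n_train + n_validate:]  # whatever remains
    return training, validation, testing


# Hypothetical incident identifiers standing in for rows of the dataset.
incident_ids = range(1, 101)
training, validation, testing = partition(incident_ids)
print(len(training), len(validation), len(testing))
```

The model is fitted on the training set, tuned against the validation set, and only scored once on the testing set, so the final error estimate is not biased by the tuning step.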


In the coming month, an effort will be made to start experimenting with the Cluster Analysis
method using SQL Server and with Association Rules Analysis. The RODBC package will also be
tried out, in order to interact directly with the TID database in SQL Server from R and to
present the use of the ggplot2 package for visualization.
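To make the planned Association Rules Analysis more concrete, the sketch below scores rules of the form {month, province} -> attack type by support and confidence. It is a simplified illustration in Python, not the Apriori implementation offered by the arules package that the project intends to explore. The records, the placeholder labels, the function name `rule_metrics` and the minimum support threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (month, province, attack_type).
# The labels are placeholders, not values from the project dataset.
incidents = [
    ("Jun", "Province A", "Type X"),
    ("Jun", "Province A", "Type X"),
    ("Jun", "Province B", "Type Y"),
    ("Jul", "Province A", "Type X"),
    ("Jul", "Province C", "Type Y"),
    ("Jul", "Province A", "Type Y"),
]


def rule_metrics(records, min_support=0.2):
    """Score rules {month, province} -> attack type by support and confidence."""
    total = len(records)
    # How often each antecedent (month, province) occurs.
    antecedents = Counter((month, province) for month, province, _ in records)
    # How often each antecedent occurs together with each attack type.
    full = Counter(records)
    rules = []
    for (month, province, attack), count in full.items():
        support = count / total
        if support < min_support:
            continue  # drop rules that are too rare
        confidence = count / antecedents[(month, province)]
        rules.append(((month, province), attack, support, confidence))
    # Strongest rules first: by confidence, then by support.
    return sorted(rules, key=lambda r: (-r[3], -r[2]))


for antecedent, attack, support, confidence in rule_metrics(incidents):
    print(antecedent, "->", attack, round(support, 2), round(confidence, 2))
```

Support measures how frequently the whole rule appears in the dataset, while confidence measures how often the attack type follows once the month and province are fixed; a rule is usually only of interest when both are high.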


